Classifier Learning for Imbalanced Data with Varying Misclassification Costs: A Comparison of kNN, SVM and Decision Tree Learning

Author

  • Jörg Mennicke
Abstract

This thesis theoretically discusses the abilities of three commonly used classifier learning methods and optimization techniques to cope with characteristics of real-world classification problems, more specifically varying misclassification costs, imbalanced data sets and varying degrees of hardness of class boundaries. From these discussions a universally applicable optimization framework is derived that successfully corrects the error-based inductive bias of classifier learning methods on image data within the domain of medical diagnosis. The framework was designed considering several points for improvement of common optimization techniques, such as modifying the optimization procedure for inducer-specific parameters, modifying input data by means of an arcing algorithm, and combining classifiers of several classifier learning methods with different settings according to locally adaptive, cost-sensitive voting schemes. The framework is designed to make the learning process cost-sensitive and to enforce more balanced misclassification costs between classes. Results on the evaluated domain are promising, while further improvements can be expected after some modifications to the framework.
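
As a rough illustration of what cost-sensitive voting can mean, the following minimal sketch combines class-probability estimates from several classifiers and then picks the class with the lowest expected misclassification cost. It is not the thesis's framework: the function names, the weights, the cost matrix and the example votes are all hypothetical, and the locally adaptive part of the voting scheme is not modeled.

```python
from typing import List, Sequence


def combine_votes(prob_votes: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of per-classifier class-probability estimates."""
    total = sum(weights)
    n_classes = len(prob_votes[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_votes)) / total
        for c in range(n_classes)
    ]


def min_expected_cost_class(probs: Sequence[float],
                            cost: Sequence[Sequence[float]]) -> int:
    """Return the predicted class with the lowest expected cost.

    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class example: class 0 = healthy, class 1 = diseased.
# Three classifiers (e.g. kNN, SVM, decision tree) each give probabilities.
votes = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
weights = [1.0, 1.0, 1.0]  # assumed equal trust in each classifier

# Assumed costs: missing a diseased case is ten times worse than a false alarm.
cost = [[0.0, 1.0],
        [10.0, 0.0]]

combined = combine_votes(votes, weights)
prediction = min_expected_cost_class(combined, cost)
```

With these assumed numbers every classifier favors class 0, yet the asymmetric cost matrix moves the final decision to class 1. With equal costs the same votes would yield class 0, which is the error-based behavior the abstract describes as needing correction.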

Similar resources

Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...

Meta-Learning for Escherichia Coli Bacteria Patterns Classification

In the machine learning area, there has been great interest during the past decade in the theory of combining machine learning algorithms. The approaches proposed and implemented become increasingly interesting at a time when many challenging real-world problems remain difficult to solve, especially those characterized by imbalanced data. Learning with imbalanced datasets is problematic, sinc...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

Improving Accuracy in Intrusion Detection Systems Using Classifier Ensemble and Clustering

With recent advances in technology, the number of network-based services is increasing, and sensitive information of users is shared through the Internet. Accordingly, large-scale malicious attacks on computer networks could cause severe disruption to network services, so cybersecurity has become a major concern for networks. An intrusion detection system (IDS) could be cons...

Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods

The Anti-Friction Bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunctions in a wide range of rotating machinery, which results in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...

Publication date: 2006